Xander: employing a novel method for efficient gene-targeted metagenomic assembly.

Authors

  • Qiong Wang
  • Jordan A Fish
  • Mariah Gilman
  • Yanni Sun
  • C Titus Brown
  • James M Tiedje
  • James R Cole
Abstract

BACKGROUND: Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes.

RESULTS: We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences.

CONCLUSION: Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines. This method is implemented as open source software and is available at https://github.com/rdpstaff/Xander_assembler.
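The idea of walking a de Bruijn graph under the guidance of a profile model can be illustrated with a toy sketch. This is not Xander's implementation: it assumes a nucleotide-level, match-state-only profile (no insert or delete states and no codon translation, unlike a real protein profile HMM), and the names `build_de_bruijn`, `profile_from_consensus`, and `best_path` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict
from math import log


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of bases that extend it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[-1])
    return graph


def profile_from_consensus(consensus, match=0.85):
    """Toy stand-in for a profile HMM: one log-probability column per position."""
    mismatch = (1.0 - match) / 3.0
    return [
        {base: log(match if base == c else mismatch) for base in "ACGT"}
        for c in consensus
    ]


def best_path(graph, profile, seed):
    """Viterbi-style search over (profile position, graph node) pairs.

    Each step consumes one profile column and follows one graph edge, so the
    graph structure and the profile scores are combined in a single search.
    """
    score = sum(profile[i][base] for i, base in enumerate(seed))
    layer = {seed: (score, seed)}  # node -> (best score, sequence so far)
    for pos in range(len(seed), len(profile)):
        nxt = {}
        for node, (s, seq) in layer.items():
            for base in graph.get(node, ()):
                new_node = node[1:] + base
                cand = (s + profile[pos][base], seq + base)
                if new_node not in nxt or cand[0] > nxt[new_node][0]:
                    nxt[new_node] = cand
        if not nxt:  # every path hit a dead end; keep the last layer
            break
        layer = nxt
    return max(layer.values())


# Two reads covering a made-up target gene and one off-target read that
# branches away from it after a shared prefix.
reads = ["ATGGCGTACG", "CGTACGTTAGC", "GGCGTCCATGA"]
graph = build_de_bruijn(reads, k=4)
profile = profile_from_consensus("ATGGCGTACGTTAGC")
score, contig = best_path(graph, profile, seed="ATG")
print(contig, round(score, 3))
```

On this toy input the graph branches at several nodes, and the profile scores steer the walk along the target branch rather than the off-target read. A real gene-targeted assembler would additionally have to handle insertions, deletions, translation to protein space, and sequencing errors.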

Related articles

A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data

Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. Lacking quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoform...

SRL-coated PAMAM dendrimer nano-carrier for targeted gene delivery to the glioma cells and competitive inhibition by lactoferrin

Glioma, as a primary tumor of central nervous system, is the main cause of death in patients with brain cancer. Therefore, development of an efficient strategy for treatment of glioma is worthy. The aim of the current study was to develop a SRL peptide-coated dendrimer as a novel dual gene delivery system for targeting the LRP receptor, an up-regulated gene in both BBB and glioma cells. To perf...

k-SLAM: accurate and ultra-fast taxonomic classification and gene identification for large metagenomic data sets

k-SLAM is a highly efficient algorithm for the characterization of metagenomic data. Unlike other ultra-fast metagenomic classifiers, full sequence alignment is performed allowing for gene identification and variant calling in addition to accurate taxonomic classification. A k-mer based method provides greater taxonomic accuracy than other classifiers and a three orders of magnitude speed incre...

Revealing large metagenomic regions through long DNA fragment hybridization capture

BACKGROUND High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes from single organisms or metagenomic samples. However, due to the limited capacity of short-read sequence data to assemble complex or low coverage regions, genomes are typically fragmented, leading to draft genomes with numerous underexplored large genomic...

Journal title:
  • Microbiome

Volume 3  Issue 

Pages  -

Publication date 2015